Word Sense Disambiguation with Information Retrieval Technique

Authors

  • Jong-Hoon Oh
  • Saim Shin
  • Yong-Seok Choi
  • Key-Sun Choi
Abstract

This paper reports on word sense disambiguation of Korean nouns using an information retrieval technique. First, context vectors are constructed from the contextual words in the training data. Then, the words in each context vector are weighted by local density. Each sense of a target word is represented in word space as a ‘Static Sense Vector’, which is the centroid of the context vectors for that sense. Contextual noise is removed using selective sampling. The selective sampling method uses an information retrieval technique to enhance discriminative power: training samples are treated as indexed documents and test samples as queries. For a query (a test sample), the top-N relevant training samples are retrieved and used to construct a ‘Dynamic Sense Vector’. The word sense is then estimated using both the ‘Static Sense Vector’ and the ‘Dynamic Sense Vector’. The Korean SENSEVAL test suite is used for the experiment, and the method produces relatively good results.
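
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: raw term counts stand in for the local-density weighting, cosine similarity is used both for retrieval and for sense scoring, and the static and dynamic scores are combined by simple addition. All function names and the toy English data are hypothetical.

```python
from collections import Counter, defaultdict
import math


def context_vector(words):
    # Bag-of-words context vector. Assumption: raw counts replace the
    # paper's local-density weighting, which the abstract does not define.
    return Counter(words)


def cosine(u, v):
    # Cosine similarity between two sparse vectors (dict-like objects).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    # Component-wise mean of a non-empty list of sparse vectors.
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def sense_centroids(samples):
    # Group (sense, words) samples by sense and take one centroid per sense.
    by_sense = defaultdict(list)
    for sense, words in samples:
        by_sense[sense].append(context_vector(words))
    return {s: centroid(vs) for s, vs in by_sense.items()}


def disambiguate(train, test_words, top_n=3):
    query = context_vector(test_words)
    # Static sense vectors: centroids over all training samples of a sense.
    static = sense_centroids(train)
    # Retrieval step: training samples act as documents, the test sample
    # as the query; keep the top-N most similar training samples.
    retrieved = sorted(
        train,
        key=lambda ex: cosine(context_vector(ex[1]), query),
        reverse=True,
    )[:top_n]
    # Dynamic sense vectors: centroids over the retrieved samples only.
    dynamic = sense_centroids(retrieved)
    # Assumption: the two similarities are simply summed.
    scores = {
        s: cosine(query, static[s]) + cosine(query, dynamic.get(s, {}))
        for s in static
    }
    return max(scores, key=scores.get)


# Toy training data for an ambiguous word such as "bank" (hypothetical).
train = [
    ("finance", ["money", "deposit", "loan"]),
    ("finance", ["account", "money", "interest"]),
    ("river", ["water", "shore", "fish"]),
    ("river", ["water", "flow", "mud"]),
]
predicted = disambiguate(train, ["money", "loan", "rate"], top_n=2)
```

In this sketch the dynamic vectors depend on the query, so senses with no retrieved samples contribute only their static score. How the paper actually weights or combines the two vectors is not stated in the abstract.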

Similar resources

Distributional Semantics Approach to Thai Word Sense Disambiguation

Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are knowledge-based, corpus-based, and hybrid. This paper pays attention to the corpus-based strategy...

Knowing a word by the company it keeps: Using Local Information in a Maximum Entropy Model for Word Sense Disambiguation

Word sense disambiguation (WSD) is a key problem in computational linguistics, with applications in areas such as machine translation and information retrieval. This paper describes a corpus-based method for word sense disambiguation which uses a versatile maximum entropy technique on simple local lexical features and a rich description of the syntactic context of a word to distinguish between ...

Investigating the Role of Different Types of Homograph Context in Determining Similarity Between Documents

Aim: Automatic information retrieval is based on the assumption that texts contain content or structural elements that can be used in word sense disambiguation and thereby improve the effectiveness of the retrieved results. Homographs are among the words requiring sense disambiguation. Depending on their roles and positions in texts, homograph contexts could be divided into different types, wit...

Word Sense Disambiguation for Cross-Language Information Retrieval

We have developed a word sense disambiguation algorithm, following Cheng and Wilensky (1997), to disambiguate among WordNet synsets. This algorithm is to be used in a cross-language information retrieval system, CINDOR, which indexes queries and documents in a language-neutral concept representation based on WordNet synsets. Our goal is to improve retrieval precision through word sense disambig...

UofL: Word Sense Disambiguation Using Lexical Cohesion

One of the main challenges in applications of Natural Language Processing (e.g., text summarization, question answering, information retrieval) is to determine which of the several senses of a word is used in a given context. This problem is known as “Word Sense Disambiguation (WSD)” in the NLP community. This paper presents the dictionary based disambiguation technique that adopts t...

Publication date: 2002